Abstract

Syntactic models should be descriptively ade- 
quate and parsable. A syntactic description is 
autonomous in the sense that it has certain ex- 
plicit formal properties. Such a description re- 
lates to the semantic interpretation of the sen- 
tences, and to the surface text. As the formal- 
ism is implemented in a broad-coverage syntac- 
tic parser, we concentrate on issues that must 
be resolved by any practical system that uses 
such models. The correspondence between the 
structure and linear order is discussed. 

1 Introduction 

The aim of this paper is to define a dependency 
grammar framework which is both linguistically 
motivated and computationally parsable. 

A linguistically adequate grammar is the pri- 
mary target because if we fail to define a de- 
scriptive grammar, its application is less use- 
ful for any linguistically motivated purposes. In 
fact, our understanding of the potential bene- 
fits of the linguistic means can increase only if 
our practical solutions stand on an adequate de- 
scriptive basis. 

Traditionally, grammatical models have been 
constructed by linguists without any consider- 
ation for computational application, and later, 
by computationally oriented scientists who have 
first taken a parsable mathematical model and 
then forced the linguistic description into the 
model which has usually been too weak to de- 
scribe what a linguist would desire. 

Our approach is somewhere between these 
two extremes. While we define the grammar 
strictly in linguistic terms, we simultaneously 



test it in the parsing framework. What is excep- 
tional here is that the parsing framework is not 
restricted by an arbitrary mathematical model 
such as a context-free phrase structure gram- 
mar. This leads us to a situation where the 
parsing problem is extremely hard in the gen- 
eral, theoretical case, but fortunately parsable 
in practice. Our result shows that, while in
general we have an NP-hard parsing problem, there
is a specific solution for the given grammar that 
can be run quickly. Currently, the speed of the 
parsing system[1] is several hundred words per
second. 

In short, the grammar should be empirically 
motivated. We have every reason to believe
that if a linguistic analysis rests on a solid de- 
scriptive basis, analysis tools based on the the- 
ory would be more useful for practical purposes. 
We are studying the possibilities of using com- 
putational implementation as a developing and 
testing environment for a grammatical formal- 
ism. We refer to the computational implemen- 
tation of a grammar as a parsing grammar. 

1.1 Adequacy 

A primary requirement for a parsing grammar is 
that it is descriptively adequate. Extreme dis- 
tortion results if the mathematical properties 
of the chosen model restrict the data. However,
this concern is not often voiced in the discussion.
For example, McCawley (1982, p. 92) notes that
a basic assumption concerning linguistic structures,
namely that "strings are more basic than trees and
that trees are available only as a side product of
derivations that operate in terms of strings", was
attributable to the historical accident "that early
transformational grammarians knew some automata
theory but no graph theory."

* In CoLing-ACL'98 workshop Processing of
Dependency-based Grammars, Kahane and Polguere
(eds), p. 1-10, Montreal, Canada, 1998



1 Demo: http://www.conexor.fi/analysers.html



One reason for computationally oriented syn- 
tacticians to favour restricted formalisms is that 
they are easier to implement. Those who began 
to use dependency models in the 1960's largely 
ignored descriptive adequacy in order to develop 
models which were mathematically simple and, 
as a consequence, for which effective parsing al- 
gorithms could be presented. These inadequa- 
cies had to be remedied from the beginning, 
which resulted in ad hoc theories or engineering 
solutions[2] without any motivation in the theory.

There have been some serious efforts to resolve
these problems. Hudson (1989), for example,
has attempted to construct a parser that
would reflect the claims of the theory (Word
Grammar) as closely as possible. However, it
seems that even linguistically ambitious depen- 
dency theories, such as Hudson's Word Gram- 
mar, contain some assumptions which are at- 
tributable to certain mathematical properties of 
an established formalism rather than imposed 
by the linguistic data[3]. These kinds of unwarranted
assumptions tend to focus the discussion
on phenomena which are rather marginal,
as far as a complete description of a language is
concerned. No wonder that comprehensive descriptions,
such as Quirk et al. (1985), have usually
been non-formal.

1.2 The European structuralist 
tradition 

We argue for a syntactic description that is 
based on dependency rather than constituency, 



and we fully agree with Hajicova (1993, p. 1)
that "making use of the presystemic insights of
classical European linguistics, it is then possible
that constituents may be dispensed with as
basic elements of (the characterization of) the
sentence structure." However, we disagree with
the notion of "presystemic" if it is used to imply 
that earlier work is obsolete. From a descriptive 
point of view, it is crucial to look at the data 
that was covered by earlier non-formal gram- 
marians. 

As far as syntactic theory is concerned, there 
is no need to reinvent the wheel. Our
description has its basis in the so-called
"classical model" based on the work of the French
linguist Lucien Tesniere. His structural model
should be capable of describing any occurring
natural language. His main work (1959)
addresses a large amount of material from
typologically different languages. It is indicative of
Tesniere's empirical orientation that there are
examples from some 60 languages, though his
method was not empirical in the sense that he
would have used external data inductively. As
Heringer (1996) points out, Tesniere used data
merely as an expository device. However, in
order to achieve formal rigour he developed a
model of syntactic description, which obviously
stems from the non-formal tradition developed
since antiquity but without compromising the
descriptive needs. We give a brief historical
overview of the formal properties inherent in
Tesniere's theory in Section 5 before we proceed
to the implementational issues in Section 6.

1.3 The surface syntactic approach 

We aim at a theoretical framework where we 
have a dependency theory that is both descrip- 
tively adequate and formally explicit. The lat- 
ter is required by the broad-coverage parsing 
grammar for English that we have implemented. 
We maintain the parallelism between the syn- 
tactic structure and the semantic structure in 
our design of the syntactic description: when a 
choice between alternative syntactic construc- 
tions in a specific context should be made, the 
semantically motivated alternative is selected[4].

Although semantics determines what kind of 
structure a certain sentence should have, from 
the practical point of view, we have a completely 
different problem: how to resolve the syntactic 
structure in a given context. Sometimes, the
latter problem leads us back to redefine the
syntactic structure so that it can be detected in
the sentence[5]. Note, however, that this
redefinition is now made on a linguistic basis. In
order to achieve parsability, the surface
description should not contain elements which cannot
be selected by using contextual information. It
is important that the redefinition should not be
made because an arbitrary mathematical model
denies e.g. crossing dependencies between the
syntactic elements.

2 See discussion of an earlier engineering art in
applying a dependency grammar in Kettunen (1994).

3 For instance, the notion of adjacency was redefined
in WG, but was still unsuitable for "free" word order
languages.

4 In such a sentence as "I asked John to go home", the
noun before the infinitive clause is analysed as the
(semantic) subject of the infinitive rather than as a
complement of the governing verb.

5 For instance, detecting the distinct roles of the
to-infinitive clause in the functional roles of the purpose
or reason is usually difficult (e.g. Quirk et al. (1985,
p. 564): "Why did he do it?"; purpose: "To relieve his
anger" and reason: "Because he was angry"). In such a
sentence as "A man came to the party to have a good
time", the interpretation of the infinitive clause depends
on the interaction of the contextual and lexical semantics

2 Constituency vs. dependency 

A central idea in American structuralism was 
to develop rigorous mechanical procedures, 
i.e. "discovery procedures", which were assumed
to decrease the grammarians' own, subjective 
assessment in the induction of the grammars. 



This practice culminated in Harris (1960,
p. 5), who claimed that "the main research
of descriptive linguistics, and the only relation
which will be accepted as relevant in the
present survey, is the distribution or arrangement
within the flow of speech of some parts or
features relative to others."

The crucial descriptive problem for a
distributional grammar (i.e. phrase-structure
grammar) is the existence of non-contiguous
elements. The descriptive praxis of some earlier IC
theoreticians allows discontiguous constituents.
For example, Wells (1947) already discussed the
problem at length and defined a restriction for
discontiguous constituents[6]. Wells' restriction
implies that a discontiguous sequence can be
a constituent only if it appears as a contiguous
sequence in another context. This means
that Wells' characterisation of a constituent
defines an element which is broadly equivalent to
the notion of bunch in Tesniere's (1959)
theory. Consequently, these two types of grammars
are capable of describing the same syntactic
phenomena and share the assumption that
a syntactic structure is compatible with its
semantic interpretation. However, the extended
constituent grammar thus no longer provides a
rigorous distributional basis for a description,
and its formal properties are unknown.



rather than a structural distinction.

6 Wells (1947): "A discontinuous sequence is a
constituent if in some environment the corresponding
continuous sequence occurs as a constituent in a construction
semantically harmonious with the constructions in which
the given discontinuous sequence occurs." Further, Wells
notes that "The phrase semantically harmonious is left
undefined, and will merely be elucidated by examples."



We can conclude our argument by stating 
that the reason to reject constituency grammars
is that the formal properties for descriptively
adequate constituency grammars are not
known. In the remaining sections, we show that 
a descriptively adequate dependency model can 
be constructed so that it is formally explicit and 
parsable. 

3 Parallelism between the syntactic 
and semantic structures 

Obviously, distributional descriptions that do 
not contribute to their semantic analysis can 
be given to linguistic strings. Nevertheless, 
the minimal descriptive requirement should be 
that a syntactic description is compatible with 
the semantic structure. The question that
arises is: if the correspondence between
syntactic and semantic structures exists, why
should these linguistic levels be separated? For
example, Sgall (1992, p. 278) has questioned



the necessity of the syntactic level altogether. 
His main argument for dispensing with the 
whole surface syntactic level is that there are 
no strictly synonymous syntactic constructions, 
and he therefore suggests that the surface word 
order belongs more properly to the level of mor- 
phemics. This issue is rather complicated. We 
agree that surface word order does not belong 
to syntactic structure, but for different reasons. 



In contradistinction to Sgall's claim, Mel'cuk
(1987, p. 33) has provided some evidence where
the morphological marker appears either in the
head or the dependent element in different
languages, as in the Russian "kniga professor+a"
(professor's book) and its Hungarian equivalent
"professzor konyv+e". Consequently, Mel'cuk
(1987, p. 108) distinguishes the morphological
dependency as a distinct type of dependency.
Thus morphology does not determine the
syntactic dependency, as Tesniere (1959, Ch. 15)
also argues.

For Tesniere (1959, Ch. 20:17), meaning
(Fr. sens) and structure are, in principle,
independent. This is backed by the intuition that
one recognises the existence of the linguistic 
structures which are semantically absurd, as il- 
lustrated by the structural similarity between 
the nonsensical sentence "Le silence vertebral 
indispose la voie licite" and the meaningful sen- 
tence "Le signal vert indique la voie libre". 



The independence of syntactic and seman- 
tic levels is crucial for understanding Tesniere's 
thesis that the syntactic structure follows from 
the semantic structure, but not vice versa. This 
means that whenever there is a syntactic rela- 
tion, there is a semantic relation (e.g. comple- 
mentation or determination) going in the op- 
posite direction. In this view, the syntactic 
head requires semantic complementation from 
its dependents. Only because the syntactic and 
semantic structures belong to different levels 
is there no interdependency or mutual depen- 
dency, though the issue is sometimes raised in 
the literature. 

There is no full correspondence between the 
syntactic and semantic structures because some 
semantic relations are not marked in the
functional structure. In Tesniere (1959, p. 85), for
example, there are anaphoric relations, semantic
relations without correspondent syntactic
relations.

4 Surface representation and 
syntactic structure 

4.1 The nucleus as a syntactic primitive 

The dependency syntactic models are inher- 
ently more "word oriented" than constituent- 
structure models, which use abstract phrase cat- 
egories. The notion of word, understood as an 
orthographic unit in languages similar to En- 
glish, is not the correct choice as a syntactic 
primitive. However, many dependency theo- 
ries assume that the orthographic words directly 
correspond[7] to syntactic primitives (nodes in
the trees). Although the correspondence could 
be very close in languages like English, there are 
languages where the word-like units are much 
longer (i.e. incorporating languages). 

Tesniere observed that because the syntactic 
connexion implies a parallel semantic connex- 
ion, each node has to contain a syntactic and a 
semantic centre. The node element, or nucleus, 
is the genuine syntactic primitive. There is no 
one-to-one correspondence between nuclei and 
orthographic words, but the nucleus consists of 
one or more, possibly discontiguous, words or 
parts of words. The segmentation belongs to 
the linearisation, which obeys language-specific 
rules. Tesniere (1959, Ch. 23:17) argued that
the notion word, a linear unit in a speech-chain,
does not belong to syntactic description at all.
A word is nothing but a segment in the speech
chain (1959, Ch. 10:3).

7 See Kunze (1975, p. 491) and Hudson (1991).

The basic element in syntactic description is 
the nucleus. It corresponds to a node in a de- 
pendency tree. When the sentence is repre- 
sented as a dependency tree, the main node con- 
tains the whole verb chain. 

There are at least two reasons why the con- 
cept of the nucleus is needed. In the first place, 
there are no cross-linguistically valid criteria 
to determine the head in, say, a prepositional 
phrase. One may decide, arbitrarily, that ei- 
ther the preposition or the noun is the head of 
the construction. Second, because the nucleus 
is also the basic semantic unit, it is the minimal 
unit in a lexicographical description. 

4.2 Linearisation 

Tesniere makes a distinction between the linear 
order, which is a one-dimensional property of 
the physical manifestations of the language, and 
the structural order, which is two-dimensional. 
According to his conception, constructing the 
structural description is converting the linear 
order into the structural order. Restricting him- 
self to syntactic description, Tesniere does not 
formalise this conversion though he gives two 
main principles: (1) usually dependents either 
immediately follow or precede their heads (pro- 
jectivity) and when they do not, (2) additional 
devices such as morphological agreement can in- 
dicate the connexion. 

Although Tesniere's distinction between the 
linear and structural order corresponds to some 
extent with the distinction between the linear 
precedence (LP) and the immediate dominance (ID),
there is a crucial difference in emphasis with re- 
spect to those modern syntactic theories, such 
as GPSG, that have distinct ID and LP compo- 
nents. Tesniere excludes word order phenom- 
ena from his structural syntax and therefore 
does not formalise the LP component at all. 
Tesniere's solution is adequate, considering that 
in many languages word order is considerably 
free. This kind of "free" word order means that 
the alternations in the word order do not neces- 
sarily change the meaning of the sentence, and 
therefore the structural description implies sev- 
eral linear sequences of the words. This does 
not mean that there are no restrictions in the 
linear word order but these restrictions do not 



emerge in the structural analysis. 

In fact, Tesniere assumes that a restriction 
that is later formalised as an adjacency princi- 
ple characterizes the neutral word order when he 
says that there are no syntactic reasons for vio- 
lating adjacency in any language, but the prin- 
ciple can be violated, as he says, for stylistic 
reasons or to save the metric structure in poet- 
ics. If we replace the stylistic reasons with the 
broader notion which comprises the discourse
functions, his analysis seems quite consistent
with our view. Rather than seeing that
there are syntactic restrictions concerning word 
order, one should think that some languages due 
to their rich morphology have more freedom in 
using word order to express different discourse 
functions. Thus, linearisation rules are not for- 
mal restrictions, but language-specific and func- 
tional. 

There is no need for constituents. Tesniere's 
theory has two mechanisms to refer to linearisa- 
tion. First, there are static functional categories 
with dynamic potential to change the initial cat- 
egory. Thus, it is plausible to separately define 
the combinatorial and linearisation properties of 
each category. Second, the categories are hierar- 
chical so that, for instance, a verb in a sentence 
governs a noun, an adverb or an adjective. The 
lexical properties, inherent to each lexical ele- 
ment, determine what the governing elements 
are and what elements are governed. 

There are no simple rules or principles for 
linearisation. Consider, for example, the treat- 
ment of adjectives in English. The basic rule is 
that attributive adjectives precede their heads. 
However, there are notable exceptions, including
the postmodified adjectives[8], which follow
their heads, and some lexical exceptions[9], which
usually or always are postmodifying.

5 Historical formulations 

In this section, the early formalisations of 
the dependency grammar and their relation to 
Tesniere's theory are discussed. The dependency
notion was already a target of extensive formal
studies in the first half of the 1960's[10].



5.1 Gaifman's formulation 

The classical studies of the formal properties
of dependency grammar are Gaifman (1965)
and Hays (1964)[11], which demonstrate that
dependency grammar of the given type is weakly
equivalent to the class of context-free
phrase-structure grammars. The formalisation of
dependency grammars is given in Gaifman (1965,
p. 305): For each category X, there will be a
finite number of rules of the type
$X(Y_1, Y_2 \cdots Y_l * Y_{l+1} \cdots Y_n)$, which means
that $Y_1 \cdots Y_n$ can depend on X in this given
order, where X is to occupy the position of *.
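
As an illustration only, a rule of this type can be sketched in Python as a small data structure. The class name `GaifmanRule`, its field names and the example categories (V, N, Adv) are assumptions made for this sketch and are not taken from Gaifman (1965).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GaifmanRule:
    """One rule X(Y_1 ... Y_l * Y_{l+1} ... Y_n); all names are illustrative."""
    head: str                 # the governing category X
    left: Tuple[str, ...]     # dependents Y_1 ... Y_l, placed before X
    right: Tuple[str, ...]    # dependents Y_{l+1} ... Y_n, placed after X

    def linearise(self) -> Tuple[str, ...]:
        # X occupies the position of '*', between the left and right dependents.
        return self.left + (self.head,) + self.right

    def licenses(self, categories: Tuple[str, ...], head_index: int) -> bool:
        # True if the category sequence matches the rule with X at head_index.
        return (
            0 <= head_index < len(categories)
            and categories[head_index] == self.head
            and categories[:head_index] == self.left
            and categories[head_index + 1:] == self.right
        )


# Hypothetical rule V(N * N Adv): one noun before the verb, a noun and an adverb after it.
rule = GaifmanRule(head="V", left=("N",), right=("N", "Adv"))
print(rule.linearise())
print(rule.licenses(("N", "V", "N", "Adv"), head_index=1))
print(rule.licenses(("N", "N", "V", "Adv"), head_index=2))
```

Because every rule fixes the order of the dependents around X, a grammar of this kind describes sequences of categories rather than order-free structures.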

Hays, referring to Gaifman's formulation
above, too strongly claims that "[d]ependency
theory is weakly omnipotent to IC theory. The
proof is due to Gaifman, and is too lengthy to
present here. The consequence of Gaifman's
theorem is that the class of sets of utterances [...]
is Chomsky's class of context-free languages."
This claim was later taken for granted to apply
to any dependency grammar, and the first,
often cited, attestation of this apparently false
claim appeared in Robinson (1970). She
presented four axioms of the theory and claimed
they were advocated by Tesniere and formalised
by Hays and Gaifman.

Thus, the over-all result of the Gaifman-Hays
proof was that there is a weak equivalence
of dependency theory and context-free
phrase-structure grammars. This weak equivalence
means only that both grammars characterize
the same sets of strings. Unfortunately,
this formulation had little to do with Tesniere's
dependency theory, but as this result met the
requirements of a characterisation theory, interest
in the formal properties of dependency grammar
diminished considerably.

8 Example: "It is a phenomenon consistent with ..."

9 Example: "president elect"

10 A considerable number of the earlier studies were
listed by Marcus (1967, p. 263), who also claimed that
"Tesniere was one of the first who used (dependency)
graphs in syntax. His ideas were repeated, developed and
precised by Y. Lecerf & P. Ihm (1960), L. Hirschberg and
I. Lynch, particularly by studying syntactic projectivity
and linguistic subtrees."

11 Tesniere is not mentioned in these papers. Gaifman's
paper describes the results "... obtained while
the author was a consultant for the RAND Corporation
in the summer of 1960." Whereas phrase-structure
systems were defined by referring to Chomsky's Syntactic
Structures, the corresponding definition for the
dependency systems reads as follows: "By dependency system
we mean a system, containing a finite number of rules, by
which dependency analysis for certain language is done,
as described in certain RAND publications (Hays,
February 1960; Hays and Ziehe, April 1960)." Speaking of the
dependency theory, Hays (1960) refers to the Soviet work
on machine translation using the dependency theory of
Kulagina et al. In Hays (1964), the only linguistic
reference is to the 1961 edition of Hjelmslev's Prolegomena:
"Some of Hjelmslev's empirical principles are closely
related to the insight behind dependency theory, but
empirical dependency in his sense cannot be identified with
abstract dependency in the sense of the present paper,
since he explicitly differentiates dependencies from other
kinds of relations, whereas the present theory intends to
be complete, i.e. to account for all relations among units
of utterances."

5.2 Linguistic hypotheses 



Tesniere's Hypothesis, as Marcus (1967) calls it,
assumes that each element has exactly one head.
Marcus also formulates a stronger hypothesis,
the Projectivity hypothesis, which connects the
linear order of the elements of a sentence to the
structural order of the sentence. The hypothesis
is applied in the following formulation: let
$x = a_1 a_2 \cdots a_i \cdots a_n$ be a sentence, where $a_i$ and
$a_j$ are terms in the sentence. If the term $a_i$ is
subordinate to the term $a_j$, and there is an index
$k$ for which $\min(i,j) < k < \max(i,j)$ holds,
then the term $a_k$ is subordinate to the term $a_j$.

This is the formal definition of projectivity, 
also known as adjacency or planarity. The intuitive
content of adjacency is that modifiers are
placed adjacent to their heads. The intuition
behind this comes from Behaghel's First
Law[12] (Siewierska, 1988, p. 143).
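
The projectivity hypothesis can be checked mechanically. The following Python sketch is one possible reading of the definition, under stated assumptions: a sentence is given as a list `heads`, where `heads[i]` is the position of the head of term i (or `None` for the root), and "subordinate" is taken to mean dependent directly or through a chain of heads. The function names and the example trees are hypothetical.

```python
from typing import List, Optional


def is_subordinate(heads: List[Optional[int]], i: int, j: int) -> bool:
    """True if term i depends on term j directly or through a chain of heads."""
    current = heads[i]
    while current is not None:
        if current == j:
            return True
        current = heads[current]
    return False


def is_projective(heads: List[Optional[int]]) -> bool:
    """Check the projectivity hypothesis for a tree given as a head list.

    heads[i] is the position of the head of term i, or None for the root.
    The input is assumed to be a well-formed tree (no cycles).
    """
    n = len(heads)
    for i in range(n):
        for j in range(n):
            if not is_subordinate(heads, i, j):
                continue
            # Every term strictly between i and j must also be subordinate to j.
            for k in range(min(i, j) + 1, max(i, j)):
                if not is_subordinate(heads, k, j):
                    return False
    return True


# Hypothetical four-term examples (positions 0..3).
nested = [1, None, 3, 1]      # 0 <- 1, 3 <- 1, 2 <- 3: arcs are nested
skipping = [2, None, 1, 1]    # term 0 depends on 2, skipping over the root at 1
print(is_projective(nested))
print(is_projective(skipping))
```

The check is cubic in the sentence length, which is acceptable for a sketch but is not meant as an efficient procedure.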

The adjacency principle is applicable only if 
the linear order of strings is concerned. How- 
ever, the target of Tesniere's syntax is struc- 
tural description and, in fact, Tesniere discusses 
linear order, a property attributable to strings, 
only to exclude linearisation from his conception
of syntax. This means that a formalisation
which characterises sets of strings cannot
even be a partial formalisation of Tesniere's
theory because his syntax is not concerned with
strings, but structures. Recently, Neuhaus and
Broker (1997) have studied some formal
properties of dependency grammar, observing that
Gaifman's conception is not compatible either
with Tesniere's original formulation or with the
"current" variants of DG.

12 "The most important law is that what belongs
together mentally (semantically) is placed close together
syntactically."

Figure 1: Non-projective dependency tree

There are several equivalent formalisations
for this intuition. In effect they say that in a
syntactic tree, where words are printed in linear
order, the arcs between the words must not
cross. For example, in our work, as the arc
between the node "what" and the node "do" in
Figure 1 violates the principle, the construction
is non-projective.
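
The "no crossing arcs" formulation can be sketched in Python as follows. This is an illustration under assumptions: the tree is given as a head list (`heads[i]` is the position of the head of word i, `None` for the root), the function name `crossing_arcs` is hypothetical, and the example is an abstract four-word tree rather than the sentence of Figure 1.

```python
from itertools import combinations
from typing import List, Optional, Tuple

Arc = Tuple[int, int]


def crossing_arcs(heads: List[Optional[int]]) -> List[Tuple[Arc, Arc]]:
    """Return all pairs of arcs that cross when words are printed in linear order.

    heads[i] is the position of the head of word i, or None for the root.
    """
    # Store each arc as (left end, right end), ignoring its direction.
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if h is not None]
    crossings = []
    for (a, b), (c, d) in combinations(arcs, 2):
        # Two arcs cross if exactly one end of one lies strictly inside the other.
        if a < c < b < d or c < a < d < b:
            crossings.append(((a, b), (c, d)))
    return crossings


# Hypothetical tree: word 2 is the root, word 0 and word 3 depend on it,
# and word 1 depends on word 3, so the arcs (0, 2) and (1, 3) cross.
print(crossing_arcs([2, 3, None, 2]))
```

This sketch tests only whether arcs cross. Whether that coincides in every case with the other formulations (for instance when an arc passes over the root) depends on details that are not fixed here.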

5.3 Formal properties of a 
Tesniere-type DG 

Our current work argues for a dependency 
grammar that is conformant with the original 
formulation in Tesniere (1959) and contains the 
following axioms: 



• The primitive element of syntactic description
is a nucleus.

• Syntactic structure consists of connexions
between nuclei.

• Connexion (Tesniere, 1959, Ch. 1:11) is a
binary functional relation between a superior
term (regent) and inferior term (dependent).

• Each nucleus is a node in the syntactic tree
and it has exactly one regent (Tesniere,
1959, Ch. 3:1).

• A regent, which has zero or more dependents,
represents the whole subtree.

• The uppermost regent is the central node
of the sentence.

These axioms define a structure graph which 
is acyclic and directed, i.e. the result is a tree. 
These strong empirical claims restrict the the- 
ory. For example, multiple dependencies and all 
kinds of cyclic dependencies, including mutual 



dependency, are excluded. In addition, there 
can be no isolated nodes. 

However, it is not required that the structure 
be projective, a property usually required in 
many formalised dependency theories that do 
not take into account the empirical fact that 
non-projective constructions occur in natural 
languages. 
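
The structural consequences of these axioms (one central node, exactly one regent per nucleus, no cycles, no projectivity requirement) can be sketched as a small validity check. The Python below is a minimal illustration, not the FDG implementation; the function name `check_structure` and the mapping format are assumptions, and the example analysis is simplified and hypothetical.

```python
from typing import Dict, List, Optional


def check_structure(regent: Dict[str, Optional[str]]) -> List[str]:
    """Return violations of the tree axioms; an empty list means the graph is a tree.

    regent maps each nucleus to its single regent, or to None for the central node.
    Using a mapping already rules out multiple regents for one nucleus.
    """
    problems = []
    roots = [n for n, r in regent.items() if r is None]
    if len(roots) != 1:
        problems.append(f"expected exactly one central node, found {len(roots)}")
    for nucleus, r in regent.items():
        if r is not None and r not in regent:
            problems.append(f"{nucleus}: regent {r} is not a nucleus of the sentence")
    for nucleus in regent:
        seen = {nucleus}
        current = regent[nucleus]
        # Follow regents upwards; meeting a nucleus twice means a cycle.
        while current is not None and current in regent:
            if current in seen:
                problems.append(f"{nucleus}: cyclic dependency")
                break
            seen.add(current)
            current = regent[current]
    return problems


# Simplified, hypothetical analysis: the verb chain is the central node.
tree = {"was running": None, "dog": "was running", "house": "was running"}
print(check_structure(tree))

# Mutual dependency is rejected: there is no central node and both nuclei are in a cycle.
print(check_structure({"a": "b", "b": "a"}))
```

Linear positions are deliberately absent from the check, so non-projective structures pass as long as they are trees. An isolated node would show up as a second central node.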

6 The Functional Dependency 
Grammar 

Our parsing system, called the Functional De- 
pendency Grammar (FDG), contains the follow- 
ing parts: 

• the lexicon,

• the CG-2 morphological disambiguation
(Voutilainen, 1995; Tapanainen, 1996), and

• the Functional Dependency Grammar
(Tapanainen and Jarvinen, 1997; Jarvinen and
Tapanainen, 1997).



6.1 On the formalism and output 

It has been necessary to develop an expressive 
formalism to represent the linguistic rules that 
build up the dependency structure. The de- 
scriptive formalism developed by Tapanainen 
can be used to write effective recognition gram- 
mars and has been used to write a comprehen- 
sive parsing grammar of English. 

When doing fully automatic parsing it is 
necessary to address word-order phenomena. 
Therefore, it is necessary that the grammar for- 
malism be capable of referring simultaneously 
both to syntactic order and linear order. Obvi- 
ously, this feature is an extension of Tesniere's 
theory, which does not formalise linearisation. 
Our solution, to preserve the linear order while
presenting the structural order, requires that
functional information is no longer coded to the
canonical order of the dependents[13].

In the FDG output, the functional information
is represented explicitly using arcs with labels
of syntactic functions. Currently, some 30
syntactic functions are applied.

13 Compare this solution with the Prague approach,
which uses horizontal ordering as a formal device to
express the topic-focus articulation at their
tectogrammatical level. The mapping from the tectogrammatical level
to the linear order requires separate rules, called shallow
rules (Petkevic, 1987). Before such a description exists,
one cannot make predictions concerning the complexity
of the grammar.

Figure 2: "The dog was running in the house"

To obtain a closer correspondence with the 
semantic structure, the nucleus format corre- 
sponding to Tesniere's stemmas is applied. It 
is useful for many practical purposes. Consider,
for example, collecting arguments for a
given verb "RUN". Having an analysis such as
that illustrated in Figure 2, it is easy to
excerpt all sentences where the governing node is
verbal having a main element that has "run"
as the base form, e.g. "ran", "was running"
(Figure 2), "did run" (Figure 3). The contracted
form "won't run" obtains the same analysis (the
same tree, although the word nuclei can contain
extra information which makes the distinction)
as the uncontracted words "will not run".
As the example shows, orthographic words are
segmented whenever required by the syntactic
analysis.

This solution did not exist prior to the FDG
and generally is not possible in a monostratal
dependency description, which takes the
(orthographic) words as primitives. The problem
is that the non-contiguous elements in a
verb-chain are assigned to a single node while the
subject in between belongs to its own node.
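
To make the use of nuclei concrete, the following Python sketch collects the dependents of every verbal nucleus whose main element has a given base form. It is a simplified illustration, not the FDG output format: the class `Nucleus`, its fields, the function labels `subj` and `loc`, and the grouping of words into nuclei are all assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Nucleus:
    """A node of the tree; field names and labels are assumptions of this sketch."""
    words: List[str]                 # surface words, possibly discontiguous
    base_form: str                   # base form of the main (semantic) element
    is_verbal: bool = False
    function: str = "main"           # label of the arc from the regent
    dependents: List["Nucleus"] = field(default_factory=list)


def arguments_of(root: Nucleus, base_form: str) -> List[Nucleus]:
    """Collect the dependents of every verbal nucleus with the given base form."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.is_verbal and node.base_form == base_form:
            found.extend(node.dependents)
        stack.extend(node.dependents)
    return found


# Hypothetical, simplified analyses of the sentences in Figures 2 and 3.
dog = Nucleus(["the", "dog"], "dog", function="subj")
place = Nucleus(["in", "the", "house"], "house", function="loc")
declarative = Nucleus(["was", "running"], "run", is_verbal=True, dependents=[dog, place])
question = Nucleus(["did", "run"], "run", is_verbal=True, dependents=[dog, place])

for tree in (declarative, question):
    print([(d.function, " ".join(d.words)) for d in arguments_of(tree, "run")])
```

Both trees yield the same arguments because the verb chain is one node, whether its words are adjacent ("was running") or separated by the subject ("did ... run").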

For historical reasons, the representation
contains a lexico-functional level closely similar to
the syntactic analysis of the earlier English
Constraint Grammar (ENGCG) (Karlsson et al.,
1995) parsing system. The current FDG
formalism overcomes several shortcomings[14] of the
earlier approaches: (1) the FDG does not rely
on the detection of clause boundaries, (2) parsing
is no longer sequential, (3) ambiguity is
represented at the clause level rather than word
level, (4) due to explicit representation of
dependency structure, there is no need to refer to
phrase-like units. Because the FDG rule formalism
is more expressive, linguistic generalisation
can be formalised in a more transparent way,
which makes the rules more readable.

14 Listed in Voutilainen (1994).

Figure 3: "Did the dog run in the house"

Figure 4: Coordinated elements



7 Descriptive solutions 
7.1 Coordination 

We now tackle the problem of how coordination
can be represented in the framework of a
dependency model. For example, Hudson (1991) has
argued that coordination is a phenomenon that
requires resorting to a phrase-structure model.

Coordination should not be seen as a directed
functional relation, but instead as a special
connexion between two functionally equal elements.
The coordination connexions are called junctions
in Tesniere (1959, Chs. 134-150). Tesniere
considered junctions primarily as a mechanism
to pack multiple sentences economically into
one. Unfortunately, his solution, which represents
all coordinative connexions in stemmas,
is not adequate, because due to cyclic arcs the
result is no longer a tree.

Our solution is to pay due respect to the formal
properties of the dependency model, which requires
that each element should have one and only one
head.¹⁵ This means that coordinated elements are
chained (Figure 4) using a specific arc for
coordination (labeled as cc). The coordinators are
mostly redundant markers (Tesniere, 1959,
Ch. 39:5)¹⁶; in particular, they do not have any
(governing) role in the syntactic structure as they
do in many word-based forms of dependency theory
(e.g. Kunze (1975) and Mel'cuk (1987)).

15 The treatment of coordination and gapping in
Kahane (1997) resembles ours in simple cases. However,
this model maintains projectivity, and consequently,
both multiple heads and extended nuclei, which are
essentially phrase-level units, are used in complex
cases, making the model broadly similar to Hudson (1991).
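The single-head requirement can be checked mechanically. The following is only a minimal sketch, not part of the FDG parser: the arc triples, the `ROOT` symbol, the `junction` label and the example sentence "Bill and John love Mary" are all assumptions made for illustration, and words are used as node identifiers, which only works when no word is repeated. It also assumes that a later conjunct attaches to the preceding conjunct by its cc arc.

```python
from collections import Counter

ROOT = "ROOT"  # hypothetical symbol for the artificial root of the tree


def is_tree(arcs):
    """Return True if every element has exactly one head and
    every chain of heads ends at ROOT (that is, no cycles)."""
    head_count = Counter(dep for dep, _label, _head in arcs)
    if any(count != 1 for count in head_count.values()):
        return False  # some element has more than one head
    head_of = {dep: head for dep, _label, head in arcs}
    for start in head_of:
        seen = set()
        node = start
        while node != ROOT:
            if node in seen or node not in head_of:
                return False  # cycle, or head outside the structure
            seen.add(node)
            node = head_of[node]
    return True


# Chained analysis: the second conjunct hangs on the first by a cc arc.
CHAINED = [
    ("love", "main", ROOT),
    ("Bill", "subj", "love"),
    ("John", "cc", "Bill"),
    ("Mary", "obj", "love"),
]

# Stemma-like analysis: both conjuncts depend on the verb and are also
# linked to each other, so "John" ends up with two heads.
JUNCTION = [
    ("love", "main", ROOT),
    ("Bill", "subj", "love"),
    ("John", "subj", "love"),
    ("John", "junction", "Bill"),
    ("Mary", "obj", "love"),
]

print(is_tree(CHAINED))   # chained coordination keeps the tree property
print(is_tree(JUNCTION))  # the extra junction arc breaks it
```

The point of the comparison is that chaining keeps the structure a tree without adding any node, whereas a direct link between the conjuncts on top of their functional arcs does not.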



Unlike the other arcs in the tree, the arc 
marking coordination does not imply a depen- 
dency relation but rather a functional equiva- 
lence. If we assume that the coordinated el- 
ements have exactly the same syntactic func- 
tions, the information available is similar to that 
provided in Tesniere's representation. If needed, 
we can simply print all the possible combina- 
tions of the coordinated elements: "Bill loves 
Mary", "John loves Mary", etc. 
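Printing the combinations is a simple cross product over the coordination chains. The sketch below rests on assumptions that go beyond the text: the arc triples, the function names and the input sentence "Bill and John love Mary and Sue" are hypothetical, coordinators are left out, and each later conjunct is taken to attach to the preceding one by a cc arc.

```python
from itertools import product

# Hypothetical arcs (dependent, label, head). The first conjunct carries
# the real syntactic function; later conjuncts are chained with "cc".
ARCS = [
    ("Bill", "subj", "love"),
    ("John", "cc", "Bill"),
    ("Mary", "obj", "love"),
    ("Sue", "cc", "Mary"),
]


def coordination_chains(arcs):
    """Map each functional dependent to the list of elements
    chained to it (directly or indirectly) by cc arcs."""
    chains = {dep: [dep] for dep, label, _head in arcs if label != "cc"}
    owner = {dep: dep for dep in chains}  # element -> first conjunct
    pending = [(dep, head) for dep, label, head in arcs if label == "cc"]
    while pending:
        progressed = False
        for dep, head in list(pending):
            if head in owner:
                owner[dep] = owner[head]
                chains[owner[head]].append(dep)
                pending.remove((dep, head))
                progressed = True
        if not progressed:
            raise ValueError("cc arc attached to an unknown element")
    return chains


def expand(arcs, verb):
    """List every combination of coordinated elements, assuming all
    members of a chain share the function of its first conjunct."""
    chains = coordination_chains(arcs)
    slots = [(label, chains[dep]) for dep, label, head in arcs
             if label != "cc" and head == verb]
    labels = [label for label, _members in slots]
    members = [chain for _label, chain in slots]
    return [tuple(zip(labels, choice)) for choice in product(*members)]


for combination in expand(ARCS, "love"):
    print(combination)
```

For the hypothetical input this gives four subject-object pairings, the analogue of "Bill loves Mary", "John loves Mary", and so on.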

7.2 Gapping 

It is claimed that gapping is an even more
serious problem for dependency theories, a
phenomenon which requires the presence of
non-terminal nodes. The treatment of gapping,
where the main verb of a clause is missing,
follows from the treatment of simple coordination.

In simple coordination, the coordinator has
an auxiliary role without any specific function
in the syntactic tree. In gapping, only the
coordinator is present while the verb is missing.
One can think that, as the coordinator represents
all missing elements in the clause, it inherits all
properties of the missing (verbal) elements
(Figure 6). This solution is also computationally
effective because we do not need to postulate
empty nodes in the actual parsing system.

16 The redundancy is shown in the existence of
asyndetic coordination. As syntactic markers,
coordinators are not completely void of semantic
content, which is demonstrated by the existence of
a contrasting set of coordinators: 'and', 'or',
'but', etc.

<John>
    "John" N SG @SUBJ subj:>2
<gave>
    "give" V PAST @+FV #2 main:>0
<the>
    "the" DET ART SG/PL @DN> det:>4
<lecture>
    "lecture" N SG @OBJ #4 obj:>2
<on>
    "on" PREP @ADVL #5 tmp:>2
<Tuesday>
    "Tuesday" N SG @<P pcomp:>5
<and>
    "and" CC @CC #7 cc:>2
<Bill>
    "Bill" N SG @SUBJ subj:>7
<on>
    "on" PREP @ADVL #9 tmp:>7
<Wednesday>
    "Wednesday" N SG @<P pcomp:>9
<.>

Figure 5: Text-based representation
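The text-based representation of Figure 5 can be read into a table of arcs with a few lines of code, which also shows that no empty node is needed for the gapped verb. This is a rough sketch rather than the actual FDG tooling: the function names and regular expressions are assumptions, and so is the numbering scheme, inferred from the fact that the `#n` markers in Figure 5 coincide with word positions.

```python
import re

FIGURE_5 = """\
<John>
    "John" N SG @SUBJ subj:>2
<gave>
    "give" V PAST @+FV #2 main:>0
<the>
    "the" DET ART SG/PL @DN> det:>4
<lecture>
    "lecture" N SG @OBJ #4 obj:>2
<on>
    "on" PREP @ADVL #5 tmp:>2
<Tuesday>
    "Tuesday" N SG @<P pcomp:>5
<and>
    "and" CC @CC #7 cc:>2
<Bill>
    "Bill" N SG @SUBJ subj:>7
<on>
    "on" PREP @ADVL #9 tmp:>7
<Wednesday>
    "Wednesday" N SG @<P pcomp:>9
<.>
"""

TOKEN = re.compile(r"^<\s*(.+?)\s*>$")                      # e.g. <gave>
READING = re.compile(r'^"([^"]+)"\s+(.*?)\s+(\w+):>(\d+)$')  # lemma, tags, arc


def parse(text):
    """Return {position: {lemma, tags, label, head}}; positions are
    assumed to be 1-based word positions, with head 0 as the root."""
    nodes = {}
    position = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if TOKEN.match(line):
            position += 1
            continue
        reading = READING.match(line)
        if reading:
            lemma, tags, label, head = reading.groups()
            nodes[position] = {"lemma": lemma, "tags": tags.split(),
                               "label": label, "head": int(head)}
    return nodes


def effective_governor(nodes, position):
    """Follow cc arcs upwards from a coordinator to the nucleus whose
    properties it inherits (a simplification for the gapping case)."""
    head = nodes[position]["head"]
    while head in nodes and nodes[head]["label"] == "cc":
        head = nodes[head]["head"]
    return head


nodes = parse(FIGURE_5)
bill = 8  # "Bill" is the eighth word of the sentence
print(nodes[bill]["head"])                               # the coordinator
print(nodes[effective_governor(nodes, bill)]["lemma"])   # the verb it stands for
```

In the parsed structure "Bill" and the second "on" are governed by the coordinator at position 7, and following its cc arc leads to "give", so the missing verb is recovered without an extra node.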

From a descriptive point of view there is no
problem if we think that the coordinator obtains
syntactic properties from the nucleus that it is
connected to. Thus, in a sentence with verbal
ellipsis, e.g. "Jack painted the kitchen white and
the living room blue", the coordinator obtains the
subcategorisation properties of a verb. A
corresponding graph is given in the accompanying
figure.

Owing to the 'flatness' of the dependency model,
gapping that involves a subject rather than
complements is equally unproblematic to describe,
as the corresponding figure shows. Note that
gapping provides clear evidence that the syntactic
element is a nucleus rather than a word. For
example, in the sentence "Jack has been lazy and
Jill angry", the elliptic element is the verbal
nucleus has been.

8 Conclusion 

This paper argues for a descriptively adequate
syntactic theory that is based on dependency
rather than constituency. Tesniere's theory
seems to provide a useful descriptive framework
for syntactic phenomena occurring in various
natural languages. We apply the theory and
develop the representation to meet the
requirements of computerised parsing description.
Simultaneously, we explicate the formal
properties of Tesniere's theory that are used in
constructing a practical parsing system.

A solution to the main obstacle to the utilisa- 
tion of the theory, the linearisation of the syn- 
tactic structure, is presented. As a case study, 
we reformulate the theory for the description 
of coordination and gapping, which are difficult 
problems for any comprehensive syntactic the- 
ory. 